One-dimensional system arising in stochastic gradient descent
Abstract
We consider stochastic differential equations of the form $dX_t = |f(X_t)|/t^{\gamma} dt + 1/t^{\gamma} dB_t$, where $f(x)$ behaves comparably to $|x|^k$ in a neighborhood of the origin, for $k \in [1,\infty)$. We show that there exists a threshold value $\tilde{\gamma}$ for $\gamma$, depending on $k$, such that if $\gamma \in (1/2, \tilde{\gamma})$ then $\mathbb{P}(X_t \rightarrow 0) = 0$, and for the rest of the permissible values of $\gamma$, $\mathbb{P}(X_t \rightarrow 0) > 0$. These results extend to discrete processes that satisfy $X_{n+1} - X_n = f(X_n)/n^\gamma + Y_n/n^\gamma$. Here, $Y_{n+1}$ are martingale differences that are almost surely bounded. This result shows that for a function $F$ whose second derivative at degenerate saddle points is of polynomial order, it is always possible to escape saddle points via the iteration $X_{n+1} - X_n = F'(X_n)/n^\gamma + Y_n/n^\gamma$ for a suitable choice of $\gamma$.
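The discrete process in the abstract can be explored numerically. The following is a minimal sketch, not the paper's method: it assumes $f(x) = |x|^k$ exactly, takes $Y_n$ uniform on $\{-1, +1\}$ as one example of bounded martingale differences, and stops once $|X_n|$ leaves a hypothetical neighborhood of radius `radius`, since $f$ is only described near the origin. The function name and all parameter values are illustrative.

```python
import random


def simulate_iteration(k, gamma, n_steps, x0=0.0, radius=1.0, seed=None):
    """Iterate X_{n+1} = X_n + f(X_n)/n^gamma + Y_n/n^gamma with f(x) = |x|^k.

    Assumptions (not from the paper): Y_n is uniform on {-1, +1}, and the
    loop stops early once |X_n| exceeds `radius`, because f is only
    modelled in a neighborhood of the origin.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for n in range(1, n_steps + 1):
        step = 1.0 / n ** gamma          # step size 1/n^gamma
        y = rng.choice((-1.0, 1.0))      # bounded, mean-zero noise
        x = x + step * (abs(x) ** k + y)
        path.append(x)
        if abs(x) > radius:              # left the neighborhood: stop
            break
    return path


if __name__ == "__main__":
    # Count how many sample paths leave the neighborhood for one gamma.
    k, gamma, n_steps, n_paths = 2, 0.75, 10_000, 200
    exits = sum(
        abs(simulate_iteration(k, gamma, n_steps, seed=s)[-1]) > 1.0
        for s in range(n_paths)
    )
    print(f"k={k}, gamma={gamma}: {exits}/{n_paths} paths left |x| <= 1")
```

Finite simulations like this cannot establish whether $\mathbb{P}(X_n \rightarrow 0)$ is zero or positive; they only give a rough picture of how the exit frequency changes with $\gamma$ and $k$.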
Related articles
Stochastic Gradient Descent with Only One Projection
Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattrac...
Variational Stochastic Gradient Descent
In Bayesian approach to probabilistic modeling of data we select a model for probabilities of data that depends on a continuous vector of parameters. For a given data set Bayesian theorem gives a probability distribution of the model parameters. Then the inference of outcomes and probabilities of new data could be found by averaging over the parameter distribution of the model, which is an intr...
Byzantine Stochastic Gradient Descent
This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the m machines which allegedly compute stochastic gradients every iteration, an α-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds ε-approximate minimizers of convex functions in T = Õ ( 1...
Parallelized Stochastic Gradient Descent
With the increase in available data parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms [5, 7] our variant comes with parallel acceleration guarantees and it poses n...
Preconditioned Stochastic Gradient Descent
Stochastic gradient descent (SGD) still is the workhorse for many practical problems. However, it converges slow, and can be difficult to tune. It is possible to precondition SGD to accelerate its convergence remarkably. But many attempts in this direction either aim at solving specialized problems, or result in significantly more complicated methods than SGD. This paper proposes a new method t...
Journal
Journal title: Advances in Applied Probability
Year: 2021
ISSN: 1475-6064, 0001-8678
DOI: https://doi.org/10.1017/apr.2020.10